Importation Plots#
This page contains instructions and documentation for creating Sankey diagrams, relative risk charts, and area plots using importation data.
area_plot#
Area plots are designed to show a change in comparable values over time; they compare apples to apples. In this example, we use H1N1 predictions from 2010 and show the origin countries for cases travelling from Asia to Europe.
Parameters#
client (bigquery.Client): BigQuery client object.
table_name (str): BigQuery table name containing importation data in ‘dataset.table’ form.
reference_table (str): BigQuery table name containing reference table in ‘dataset.table’ form.
source_geo_level (str): The name of a column from the reference table. The geographical level used to determine what sources are included.
target_geo_level (str): The name of a column from the reference table. The geographical level used to determine what targets are included.
source_values (str or listlike or None, optional): The source(s) to be included. A value or subset of values from the source_geo_level column. If None, then all values will be included. Defaults to None.
target_values (str or listlike or None, optional): The target(s) to be included. A value or subset of values from the target_geo_level column. If None, then all values will be included. Defaults to None.
source_column (str, optional): Name of column in original table containing source identifier. Defaults to ‘source_basin’.
target_column (str, optional): Name of column in original table containing target identifier. Defaults to ‘target_basin’.
reference_column (str, optional): Name of column in original table containing the geography corresponding to data in source_column and target_column. Defaults to ‘basin_id’.
output_resolution (str or None, optional): The name of a column from the reference table. Desired geographical resolution for the area plot. If None, then target_geo_level will be used. Defaults to None.
domestic (bool, optional): Whether or not domestic cases will be included. Defaults to True. Note: can produce unexpected results.
cutoff (float, optional): From 0 to 1, inclusive. All sources or targets that contribute less than this fraction of cases will be grouped into an ‘Other’ category. Set to 0 for no ‘Other’ category. Defaults to 0.05.
value (str, optional): Name of column in the original table containing the importation value to be analyzed. Defaults to ‘importations’.
display (str, optional): Whether the sources or the targets of the importations will be visualized, either ‘source’ or ‘target’. Defaults to ‘source’.
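As a rough illustration of how cutoff behaves, the sketch below merges every source whose share of total cases falls below the cutoff into an ‘Other’ row. The helper group_below_cutoff and the case counts are hypothetical, and the package's internal implementation may differ (for example, in whether the comparison is strict).

```python
import pandas as pd

def group_below_cutoff(totals: pd.Series, cutoff: float = 0.05) -> pd.Series:
    """Hypothetical helper: merge entries whose share is below `cutoff` into 'Other'."""
    if not 0 <= cutoff <= 1:
        raise ValueError("cutoff must be between 0 and 1, inclusive")
    shares = totals / totals.sum()      # each entry's fraction of all cases
    keep = shares >= cutoff             # entries large enough to show on their own
    grouped = totals[keep].copy()
    other = totals[~keep].sum()
    if other > 0:                       # with cutoff=0 nothing is merged
        grouped["Other"] = other
    return grouped

# Made-up case counts by source country
totals = pd.Series({"China": 600, "Japan": 250, "India": 120, "Nepal": 20, "Bhutan": 10})
result = group_below_cutoff(totals, cutoff=0.05)
print(result)
```

With these made-up numbers, Nepal and Bhutan each contribute less than 5% of cases, so they are combined into a single ‘Other’ entry.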
Returns#
fig (plotly.graph_objects.Figure): Plotly Figure containing visualization.
Example#
import epidemic_intelligence as ei
from google.oauth2 import service_account
from google.cloud import bigquery
credentials = service_account.Credentials.from_service_account_file('../../../credentials.json') # use the path to your credentials
project = 'net-data-viz-handbook' # use your project name
# Initialize a BigQuery client
client = bigquery.Client(credentials=credentials, project=project)
table_name = 'importation_data.h1n1_proper_simple_agg'
reference_table = 'reference.gleam-geo-map'
source_geo_level = 'continent_label' # Geographic level for source filtering
target_geo_level = 'continent_label' # Geographic level for target filtering
output_resolution = 'country_name' # Geographic level for output
source_values = ['Asia']
target_values = ['Europe']
domestic = False
cutoff = 0.05
display='source'
ap_fig = ei.area_plot(client=client,
                      table_name=table_name,
                      reference_table=reference_table,
                      source_geo_level=source_geo_level,
                      target_geo_level=target_geo_level,
                      output_resolution=output_resolution,
                      source_values=source_values,
                      target_values=target_values,
                      domestic=domestic,
                      cutoff=cutoff,
                      display=display)
# finishing touches
ap_fig.update_layout(width=900, height=500,  # dimensions
                     title_text='Exportations from Asia to Europe',  # changing title
                     font_family='PT Sans Narrow')
ap_fig.show()
sankey#
Sankey diagrams are designed to show the relative volume of flow from sources to targets. In this example, we use H1N1 predictions from 2010 and show the origin countries and destination regions for cases travelling from Asia to Europe.
Parameters#
client (bigquery.Client): BigQuery client object.
table_name (str): BigQuery table name containing importation data in ‘dataset.table’ form.
reference_table (str): BigQuery table name containing reference table in ‘dataset.table’ form.
date_range (list of str): Start and end date, inclusive. Dates should be formatted as they are in your original table.
source_geo_level (str): The name of a column from the reference table. The geographical level used to determine what sources are included.
target_geo_level (str): The name of a column from the reference table. The geographical level used to determine what targets are included.
source_values (str or listlike or None, optional): The source(s) to be included. A value or subset of values from the source_geo_level column. If None, then all values will be included. Defaults to None.
target_values (str or listlike or None, optional): The target(s) to be included. A value or subset of values from the target_geo_level column. If None, then all values will be included. Defaults to None.
source_column (str, optional): Name of column in original table containing source identifier. Defaults to ‘source_basin’.
target_column (str, optional): Name of column in original table containing target identifier. Defaults to ‘target_basin’.
reference_column (str, optional): Name of column in original table containing the geography corresponding to data in source_column and target_column. Defaults to ‘basin_id’.
source_resolution (str or None, optional): The name of a column from the reference table. Desired geographical resolution for source nodes. If None, then source_geo_level will be used. Defaults to None.
target_resolution (str or None, optional): The name of a column from the reference table. Desired geographical resolution for target nodes. If None, then target_geo_level will be used. Defaults to None.
domestic (bool, optional): Whether or not domestic cases will be included. Defaults to True. Note: can produce unexpected results.
cutoff (float, optional): From 0 to 1, inclusive. All sources or targets that contribute less than this fraction of cases will be grouped into an ‘Other’ category. Set to 0 for no ‘Other’ category. Defaults to 0.05.
n_sources (int, optional): The maximum number of source nodes in the Sankey diagram. Only this number of sources minus one will show; the rest will be aggregated into ‘Other’ regardless of cutoff. Must be a positive integer.
n_targets (int, optional): The maximum number of target nodes in the Sankey diagram. Only this number of targets minus one will show; the rest will be aggregated into ‘Other’ regardless of cutoff. Must be a positive integer.
value (str, optional): Name of column in the original table containing the importation value to be analyzed. Defaults to ‘importations’.
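The node limits n_sources and n_targets can be pictured with the sketch below, which keeps the n - 1 largest entries and sums the remainder into ‘Other’. The helper limit_nodes and the case counts are hypothetical; in particular, returning the data unchanged when it already fits within n is an assumption, not documented behavior.

```python
import pandas as pd

def limit_nodes(totals: pd.Series, n: int) -> pd.Series:
    """Hypothetical helper: keep the n - 1 largest entries, sum the rest into 'Other'."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    ranked = totals.sort_values(ascending=False)
    if len(ranked) <= n:
        return ranked                   # assumption: nothing to aggregate
    top = ranked.iloc[: n - 1].copy()   # the n - 1 largest entries
    top["Other"] = ranked.iloc[n - 1 :].sum()
    return top

# Made-up case counts by source country
totals = pd.Series({"China": 600, "Japan": 250, "India": 120, "Nepal": 20, "Bhutan": 10})
limited = limit_nodes(totals, n=3)
print(limited)
```

Here n=3 leaves two named nodes plus ‘Other’, even though India would have survived a cutoff of 0.05 on its own.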
Returns#
fig (plotly.graph_objects.Figure): Plotly Figure containing visualization.
Example#
table_name = "importation_data.h1n1_proper_simple_agg"  # 'dataset.table' form
reference_table = 'reference.gleam-geo-map'
source_geo_level = "continent_label" # This could also be "region_id" or other levels
source_values = ['Asia']
source_resolution = "country_name"
target_geo_level = "continent_label" # Could be "country_id", "region_label", etc.
target_values = ["Europe"] # Regions to filter on
target_resolution = 'region_label'
date_range = ["2009W40", "2009W43"] # The date range for the data
cutoff = 0.08 # Threshold for categorizing regions
domestic = True
s_fig = ei.sankey(client=client,
                  table_name=table_name,
                  reference_table=reference_table,
                  source_geo_level=source_geo_level,
                  target_geo_level=target_geo_level,
                  source_values=source_values,
                  target_values=target_values,
                  source_resolution=source_resolution,
                  target_resolution=target_resolution,
                  date_range=date_range,
                  cutoff=cutoff,
                  domestic=domestic,
                  n_sources=5,
                  n_targets=None)
# finishing touches
s_fig.update_layout(width=400, height=600,
                    font_family='PT Sans Narrow',
                    title='October 2009: Asia to Europe')
s_fig.show()
relative_risk#
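Relative risk charts are bar charts designed to show the relative risk of importation for each target. In this example, we reuse the parameters from the Sankey example and show the relative risk of importation from Asia for countries in Europe.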
Parameters#
client (bigquery.Client): BigQuery client object.
table_name (str): BigQuery table name containing importation data in ‘dataset.table’ form.
reference_table (str): BigQuery table name containing reference table in ‘dataset.table’ form.
date_range (list of str): Start and end date, inclusive. Dates should be formatted as they are in your original table.
source_geo_level (str): The name of a column from the reference table. The geographical level used to determine what sources are included.
target_geo_level (str): The name of a column from the reference table. The geographical level used to determine what targets are included.
source_values (str or listlike or None, optional): The source(s) to be included. A value or subset of values from the source_geo_level column. If None, then all values will be included. Defaults to None.
target_values (str or listlike or None, optional): The target(s) to be included. A value or subset of values from the target_geo_level column. If None, then all values will be included. Defaults to None.
source_column (str, optional): Name of column in original table containing source identifier. Defaults to ‘source_basin’.
target_column (str, optional): Name of column in original table containing target identifier. Defaults to ‘target_basin’.
reference_column (str, optional): Name of column in original table containing the geography corresponding to data in source_column and target_column. Defaults to ‘basin_id’.
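output_resolution (str or None, optional): The name of a column from the reference table. Desired geographical resolution for the relative risk chart.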
domestic (bool, optional): Whether or not domestic cases will be included. Defaults to True. Note: can produce unexpected results.
cutoff (float, optional): From 0 to 1, inclusive. All sources or targets that contribute less than this fraction of cases will be grouped into an ‘Other’ category. Set to 0 for no ‘Other’ category. Defaults to 0.05.
n (int, optional): The maximum number of bars in the relative risk chart. Only this number of targets minus one will show, the rest will be aggregated into ‘Other’ regardless of cutoff. Must be a positive integer.
value (str, optional): Name of column in the original table containing the importation value to be analyzed. Defaults to ‘importations’.
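To make the inclusive date_range concrete, the sketch below filters a small pandas DataFrame by week strings of the form used in the examples on this page. The data and the column name date are made up, and the package itself may apply this filter in BigQuery rather than in pandas.

```python
import pandas as pd

# Made-up weekly importations, with dates stored as zero-padded week strings
df = pd.DataFrame({
    "date": ["2009W39", "2009W40", "2009W41", "2009W43", "2009W44"],
    "importations": [4, 7, 9, 12, 6],
})

date_range = ["2009W40", "2009W43"]

# Zero-padded week strings sort chronologically, so a plain string
# comparison works; Series.between includes both endpoints by default
in_range = df[df["date"].between(date_range[0], date_range[1])]
total = in_range["importations"].sum()
print(in_range)
```

Because the comparison is done on strings, the dates in date_range need to match the formatting of the table exactly, which is why they should be written as they appear in the original table.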
Returns#
fig (plotly.graph_objects.Figure): Plotly Figure containing visualization.
Example#
# uses same parameter set as sankey!
rr_fig = ei.relative_risk(client=client,
                          table_name=table_name,
                          reference_table=reference_table,
                          source_geo_level=source_geo_level,
                          target_geo_level=target_geo_level,
                          source_values=source_values,
                          target_values=target_values,
                          date_range=date_range,
                          cutoff=0.0,  # changing cutoff to 0
                          n=15,  # limiting to 15 rows
                          output_resolution='country_name')  # changing output resolution from region to country
# finishing touches
rr_fig.update_layout(width=700, height=500, font_family='PT Sans Narrow')
rr_fig.show()
Fetching data from importation plots#
epidemic_intelligence.importation_plots offers functions for extracting the data behind importation plots into a pandas DataFrame. Each function takes a single parameter, fig, which is the Plotly Figure returned by the corresponding graphing function.
fetch_area_plot_data#
Parameters#
fig (plotly.graph_objects.Figure): Figure object returned by area_plot.
Returns#
df (pandas.DataFrame): pandas dataframe containing data.
Example#
ap_df = ei.fetch_area_plot_data(ap_fig)
ap_df.head()
| | source | exportations |
|---|---|---|
| 2009W18 | Other | 1 |
| 2009W19 | Other | 1 |
| 2009W20 | Other | 3 |
| 2009W21 | Other | 8 |
| 2009W22 | Other | 19 |
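The long-format output shown above can be reshaped into one column per source, which is often easier to inspect or export. The sketch below is a minimal example on a stand-in DataFrame with the same layout; the China rows and the index name date are made up for illustration.

```python
import pandas as pd

# Stand-in for ap_df: the index holds the week, one row per (week, source)
ap_df = pd.DataFrame(
    {"source": ["Other", "China", "Other", "China"],
     "exportations": [1, 2, 1, 5]},
    index=["2009W18", "2009W18", "2009W19", "2009W19"],
)

# Name the index (hypothetical name), move it into a column, then pivot to wide form
wide = (
    ap_df.rename_axis("date")
    .reset_index()
    .pivot_table(index="date", columns="source", values="exportations",
                 aggfunc="sum", fill_value=0)
)
print(wide)
```

Each row of wide is one week and each column one source, matching the stacked layers of the area plot.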
fetch_sankey_data#
Parameters#
fig (plotly.graph_objects.Figure): Figure object returned by sankey.
Returns#
df (pandas.DataFrame): pandas dataframe containing data.
Example#
s_df = ei.fetch_sankey_data(s_fig)
s_df.head()
| | source | target | value |
|---|---|---|---|
| 0 | Other | Western Europe | 0.109204 |
| 1 | Cyprus | Northern Europe | 0.034391 |
| 2 | Cyprus | Western Europe | 0.014957 |
| 3 | Cyprus | Eastern Europe | 0.026480 |
| 4 | Cyprus | Southern Europe | 0.018897 |
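Once the links are in a DataFrame, ordinary pandas aggregation applies. As a small sketch, the code below rebuilds the five rows shown above by hand (rather than calling fetch_sankey_data) and sums the flow arriving at each target and leaving each source.

```python
import pandas as pd

# The five rows displayed above, entered by hand as a stand-in for s_df
s_df = pd.DataFrame({
    "source": ["Other", "Cyprus", "Cyprus", "Cyprus", "Cyprus"],
    "target": ["Western Europe", "Northern Europe", "Western Europe",
               "Eastern Europe", "Southern Europe"],
    "value": [0.109204, 0.034391, 0.014957, 0.026480, 0.018897],
})

# Total flow into each target node and out of each source node
by_target = s_df.groupby("target")["value"].sum()
by_source = s_df.groupby("source")["value"].sum()
print(by_target.sort_values(ascending=False))
```

These totals cover only the five rows shown, not the full Sankey diagram.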
fetch_relative_risk_data#
Parameters#
fig (plotly.graph_objects.Figure): Figure object returned by relative_risk.
Returns#
df (pandas.DataFrame): pandas dataframe containing data.
Example#
rr_df = ei.fetch_relative_risk_data(rr_fig)
rr_df.head()
| | relative_risk_of_importation |
|---|---|
| Germany | 0.188166 |
| United Kingdom | 0.160671 |
| Russian Federation | 0.136012 |
| Other | 0.132609 |
| France | 0.072842 |
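A common follow-up is to set aside the aggregated ‘Other’ row and look at the named targets only. The sketch below does this on a hand-built copy of the five rows shown above, used here as a stand-in for the DataFrame returned by fetch_relative_risk_data.

```python
import pandas as pd

# The five rows displayed above, entered by hand as a stand-in for rr_df
rr_df = pd.DataFrame(
    {"relative_risk_of_importation": [0.188166, 0.160671, 0.136012, 0.132609, 0.072842]},
    index=["Germany", "United Kingdom", "Russian Federation", "Other", "France"],
)

# Drop the aggregated row, then find the highest-risk named target
named = rr_df.drop(index="Other")
top_target = named["relative_risk_of_importation"].idxmax()

# Combined relative risk of the three largest named targets
top3_total = named["relative_risk_of_importation"].nlargest(3).sum()
print(top_target, round(top3_total, 6))
```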